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^ Abstract 

" We show that numerical data assimilation is feasible in principle for an 

idealized model only if an effective dimension of the noise is bounded; this 
effective dimension is bounded when the noises in model and data satisfy 
a certain natural balance condition. If this balance condition is satisfied, 
data assimilation is feasible even if the number of variables in the problem 
is huge. We then analyze several data assimilation algorithms, including 
particle filters and variational data assimilation. We show that a particle 
filter can successfully solve most of the data assimilation problems which 
, ^ , are feasible in principle, provided the particle filter is well designed. We 

also compare the conditions under which variational data assimilation can 
be successful with the conditions for successful particle filtering. We draw 
conclusions from our analysis and discuss its limitations. 

CN 1 Introduction 

cn 

Many applications in science and engineering require that the predictions of 
uncertain models be updated by information from a stream of noisy data 
(see e.g. [6||11|). The model and data jointly define a conditional probability 
• i-H density function (pdf) p(x°' n \z 1:n ), where the discrete variable n = 0, 1, 2, . . . 

can be thought of as discrete time, x n is a real m-dimensional vector to be es- 
£ timated, called the "state" , x°- n is a shorthand for X j X j • • • j X ^ and where 

the data sets z n are a A;- dimensional vectors (k < m). All the information 
we have about the state at time n is contained in this conditional pdf and a 



J: 



variety of methods are available for its study, e.g. the Kalman filter 18 , the 
extended and ensemble Kalman filter |13| , particle filters |11| , or variational 
methods [4,33 . Given a model and data, each of these algorithms will pro- 



duce a result. We are interested in the conditions under which this result is 
reasonable, i.e. consistent with the real-life situation one is modeling. 
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Generally, we restrict the analysis to linear state space models driven 
by Gaussian noise and supplemented by a synchronous stream of data per- 
turbed by Gaussian noise, i.e. the noisy data are available at every time step 
of the model and only then. We further assume that all model parameters 
(including the covariance matrices of the noise) are known, i.e. we consider 
the case of state estimation (not parameter estimation or combined state 
and parameter estimation). We study this class of problems because it can 
be examined in some generality and (we believe) can explain qualitatively 
its important aspects, but we also point out its limitations. 

In section 2, we examine what can be expected for our Gaussian model 
in principle, without regard to a specific algorithm. We define an effective 
dimension of a Gaussian data assimilation problem and show that, unless 
this effective dimension is moderate, the uncertainty in the model and the 
data is excessive so that making reliable conclusions about the underlying 
process is impossible. We argue that realistic problems have a moderate 
effective dimension. Investigating the role of the effective dimension in data 
assimilation algorithms is the subject of the remaining part of the paper. 
We briefly review particle filters in section 3. In section 4, we use the results 
of |30| to find the conditions under which the optimal particle filter (which 
in the linear synchronous case coincides with the implicit particle filter) per- 
forms well, and compare these conditions to what can be done in principle. 
We conclude that optimal particle filters can solve many data assimilation 
problems even if the number of variables to be estimated is large. Build- 



ing on the results in [2j[5,31 , another particle filter is shown to fail under 
realistic conditions. Thus, the implementation of particle filters is very im- 
portant, since a poor implementation can lead to a poor performance even if 
the effective dimension is small. In section 5 we consider particle smoothing 
and variational data assimilation and show that these methods can only be 
successful under conditions which are comparable to what we observed in 
particle filtering. We discuss limitations of our analysis in section [6] and 
present conclusions in section 7. 

To avoid confusion, we wish to point out here that the effective dimen- 
sion defined in this paper is different from the effective dimension introduced 



in |2l5 30,31 . The effective dimension in 2 , 5, 30 31 is connected to particle 



filters, whereas the effective dimension defined in this paper is a character- 
istic of the model and data stream, i.e. independent of the algorithm for 
data assimilation. We show in particular that the effective dimension (as 
defined here) is well-bounded for realistic models and that numerical data 
assimilation can be successful in these cases, even if the number of variables 
to be estimated is large (i.e. a moderate effective dimension in our sense can 
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imply a small effective dimension in the sense of [2j[5j[30j[3l] . 

2 The effective dimension of linear Gaussian data 
assimilation problems 

We consider autonomous, linear Gaussian state space models of the form 

x n+l =Ax n + w n ^ 

where n = 0, 1, 2, . . . is a discrete time, A n is a given m x m matrix and w n 
are independent and identically distributed (iid) Gaussian random variables 
with mean zero and given covariance matrix Q, which we write as w n ~ 
J\f(0,Q). The initial conditions may be random and we assume that their 
pdf is also Gaussian, i.e. x° ~ A/"(/Uo> ^a), with both and Eo given. We 
assume further that the data satisfy 

z n+1 = Hx n+1 +v n+ \ (2) 

where H is a given k x m matrix (k < m) and the v n+l ~ A/"(0, R) are iid, 
where R is a given k x k matrix. The u> n 's and v n, s are independent of each 
other. 

In principle, but not necessarily in practice, the covariance matrix P n 
of the state x n conditioned on the data z can be computed recursively, 
starting with P$ = Y>q: 

X n = AP n A T + Q, 

K n = X n H T {HX n H T + R)-\ 

Pn+l = {Im — K n H)X n , 

where I m is the identity matrix of order m and the m x k matrix K is 
often called the "Kalman gain". This is the Kalman formalism. Under 
suitable conditions on the ranks of the several matrices (specifically, if the 
pair (H, A) is d-detectable and (A,Q) is d-stabilizable, see [22], pp. 90-91), 
the covariance matrix reaches a "steady state" so that 

Pn+l =P n = P=(I- KH)X, 

where X is the unique positive semi-definite solution of the discrete algebraic 
Riccati equation (DARE) 

X = AXA T - AXH T (HXH T + R)~ 1 HXA T + Q. 
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Note that the steady state value of the covariance matrix is independent 
of the initial covariance matrix Eo and that the rate of convergence to this 
limit is at least linear, in many cases quadratic (see 22 , p. 313). 



This 



means that, after a relatively short time, the samples of the state given the 
data are normally distributed with mean \x n and covariance matrix P (the 
mean /x n of the variables is not needed here, but it can also be computed 
using Kalman's formulas). 

The Frobenius norm ||P||f = (Si 
(pij) determines how far these samples spread out in the state space. To 
see this, consider the random variable y = (x n — fJ. n ) T (x n — fx n ), where 
%n — ~ AT(0, P), i.e. consider the squared distances of the samples from 
their mean (their most likely value). Let U be an orthogonal m x m matrix 
whose columns are the eigenvectors of P. Then 



pfj) 1 ^ 2 °f the covariance matrix P 



T 
S S 



E4 



where s = U (x n — fi n ) ~ A/"(0, A), and A = U PU is a diagonal matrix 
whose diagonal elements are the m eigenvalues Xj of P. It is now straightfor- 
ward to compute the mean and variance of y because the Sj's (the elements 
of s) are independent : 



Note that y 



where r is the distance from the sample to the most 



likely state (the mean). Assuming that m is finite (but large), we obtain, 



using Taylor expansion of r / yJ^Aj 
assuming that Xj = 0(1), that 
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Using the techniques in [5] , one can show that for m — > oo and ^ A — > oo 
and with Xj = 0(1) (i.e. in the case for which the moments of y do not 
necessarily exist) the above expressions become "Asmean(r)" respectively 
u Asvar(r)" , i.e. the moments of limiting Gaussian variables. 

Now suppose that m is large but finite and that Xj = O(l) for j = 
1, ... ,m; then E = 0{m 1 / 2 ) and v = O(l). This means that the samples 
collect on a shell of thickness 0(1) at a distance 0(m l l 2 ) from their mean 
and are distributed over a volume 0(m^ m+1 ^ 2 ), i.e. the predictions spread 
out over a large volume at a large distance from the most likely state. How- 
ever, the data assimilation problem reflects an experimental situation, and 
the numerical samples should behave just like experimental samples: if the 
uncertainty is large, one will observe that the outcomes of repeated experi- 
ments exhibit a large spread; if the uncertainty is small, then the spread in 
the outcomes of experiments is also small. Since the outcomes of repeated 
experiments rarely exhibit large variations, one should expect that the sam- 
ples of numerical data assimilation all fall into a small "low-dimensional" 
ball, centered around the most likely state, i.e. the radius, E(r) « E, is 
comparable to the thickness, var(r) ~ v. Data assimilation only makes 
sense in this case because, otherwise, the samples spread out over a huge 
volume and there is not enough information in the model and data to make 
reliable conclusions about the state. 

Standard inequalities show that 



\ 3=1 3=1 \ 



and now one can obtain upper bounds for E and v: 

( m \ 1/4 / - ^ 1 2 



These upper bounds imply that the sum of the eigenvalues squared of the 
steady state covariance matrix P, i.e. its Frobenius norm, determines the 
mean and variance of the distance of a sample from the most likely state, i.e. 
the spread of the samples in the state space. We thus define the effective di- 
mension of the Gaussian data assimilation problem defined by equations ([TJ 

and (p) to be the Frobenius norm of P, ||-P||f = \Jlil, ^f- 

Note that this effective dimension is different from the definitions in 
|J§[30K3l], which are defined in connection with specific particle filters. 



5 



The effective dimension denned here is independent of a data assimilation 
technique; it is a characteristic of the model ([I]) and data stream ([2]). We 
expect the effective dimension to be bounded in practice and corroborate 
this point in the next two sections. 

Finally we want to point out that we study the posterior pdf (the pdf 
of x conditioned on the data); this pdf can be calculated with the Kalman 
filter formulas, which however are valid only for linear Gaussian models as 
in ([I]) and ([2]). Nonlinear or non-Gaussian models are not discussed here, 
but we mention the limitations of our analysis in more detail in section [6| 

2.1 Bounds on the effective dimension 

To discover the real-life interpretation of the effective dimension, we study its 
upper bounds in terms of the Frobenius norms of Q and R. From Khinchin's 
theorem [8] we know that the Frobenius norms of Q and R must be bounded 
or else the energy of the noise is infinite (which is unrealistic). Here, we 
show that a small Frobenius norm of Q and R can lead to a small effective 
dimension. We start by a simple example, which is also useful in the study 
of data assimilation methods in later sections. 



2.1.1 Example 

Put A = H = I m and let Q = ql m , R = rl m . The Riccati equation can be 
solved analytically for this example and we find that the Frobenius norm of 
the steady state covariance is 

,— yJq 2 +^qr - q 
r f = x/m— . 



In a real-life problem, we would expect ||-P||f to grow slowly, if at all, when 
the number of variables increases. The boundedness of the effective dimen- 
sion induces a "balance condition" between the errors in the model (repre- 
sented by q) and the errors in the data (represented by r). In this simple 
example, this balance condition is the inequality 

yV + Aqr - q 1 



m 



where the 1 in the numerator of the right-hand side stands for a constant 
O(l), or even for a function of m that grows slower than \fm\ we set this 
constant equal to 1 because this already captures the general behavior. Fig- 
ure [l] illustrates the balance condition and shows a plot of the function that 
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is defined by the left-hand-side of the above inequality as well as three level 
sets, corresponding to m = 5, 10, 100 respectively; for a given dimension m, 
all values of q and r below the corresponding level set lead to an O(l) effec- 
tive dimension. 




0.2 0.6 1.0 1.4 1.8 

q 



Figure 1: Covariance map for sequential data assimilation. 



The balance condition in this model problem implies that the smaller 
the errors in the data (r), the larger can be the uncertainty in the model (q) 
and vice versa, and, for large m, only small values of q and r are allowed. 
Moreover, note that for very small q, the boundaries for successful data 
assimilation are (almost) vertical lines. The reason is that if the model is 
very good, neither accurate nor inaccurate data can improve it, i.e. data 
assimilation is not necessary. If the model is poor, only nearly perfect data 
can help. We will encounter this balance condition (in more complicated 
forms) again in the general case in the next section and also in the analysis 
of particle filters and variational data assimilation. 

Finally, note that the Frobenius norms ||Q||f = qyfrn and ||-R||f = Tyjm 
increase with the number of dimensions unless q or r or both decrease with 
m as shown in figure [TJ We will argue in section 2.2 that in realistic cases, 
the Frobenius norms of Q and R remain bounded as m increases. We also 
expect, but cannot prove in general, that a balance condition as in figure [T] 
is valid in the general case (arbitrary A, H, Q, R), with q and r replaced by 
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the Frobenius norms of Q and R. 
2.1.2 The general case 

In the general case, the balance condition between the uncertainties in the 
model (||Q||f) and data is more complicated because the effective 

dimension is the Frobenius norm of the solution of a Riccati equation which 
in general does not admit a closed form solution. 

However, if the covariance matrices Q and R have small Frobenius norms, 
then the effective dimension of the problem can be small and data assimila- 
tion can be successful. To see this, let X and P be the solution of the DARE 
respectively the steady state covariance matrix of a given (A, Q,H,R) data 
assimilation problem and let Q < Q, i.e. Q — Q is symmetric positive 
semi-definite (SPD). If R < R, then, by the comparison theorem (Theorem 
13.3.1) in 1 22 1, X < X, where X is the solution of the DARE associated with 
the (A, Q, H, R) data assimilation problem. From the Kalman formulas we 
know that 

P = X — XH T (HXH T + Ry 1 HX, 

which implies that P < X. Moreover, for two SPD matrices C and D, 
C < D implies ||C||.f < Thus, the smaller the Frobenius norm of Q 

and R, the smaller is the upper bound ||A||^ on the effective dimension. 

However, the requirement that these Frobenius norms be small is not 
sufficient to ensure that the effective dimension of the problem is small; 
in particular, it is evident that the properties of A must play a role; for 
example, if the L2 norm of A exceeds unity, the model ([I]) is unstable and 
successful data assimilation is unlikely. 

While the model, or A, is implicitly accounted for in X, the solution 
of the DARE, one can construct sharper bounds on the effective dimension 
by accounting for the model and data stream ([2]) more explicitly. To 
that extend, we construct matrix bounds on P, from matrix bounds for the 
solution of the DARE [21]. Let X < X u , and Xi < X, be upper and lower 
matrix bounds for the solution of the DARE, for example, we can choose 
the lower bound in [20] 

Q < Xi = AiQ- 1 + H T R- l H)- 1 A T + Q<X, 

and the upper bound in [2l] 

X < X u = A(X~ l + H T R- l H)- l A T + Q, 



S 



where X, = A(r)- 1 +H T ir 1 H)- 1 A T +Q, 77 = f(-X 1 (A)-X n (H T R- 1 H)X 1 (Q)+ 
l,2A n (i? T i?- 1 ,H"),2Ai(Q))), f(a,b,c) = (Va^+bc - o)/2) and A X (C) and 
A n (C) are the largest respectively smallest eigenvalue of the matrix C. Then 
an upper matrix bound for the steady state covariance matrix is 

P<X U - X l H T (HX u H T + R^HX^ 

The Frobenius norm of this upper matrix bound is an upper bound for the 
effective dimension. 



2.2 The real- world interpretation of effective dimension 

We have shown that there is little hope for making reliable conclusions 
unless the effective dimension of the data assimilation problem defined by 
equations and ([2]) is small. We now give more detail about the physical 
interpretation of this result. 

Suppose the variables x one is estimating are point values of, for example, 
the velocity of a flow field (as they often are in applications). The Frobenius 
norm of the covariance matrix Q is proportional to the specific kinetic energy 
of the noise field that is perturbing an underlying flow. This energy should 
be a small fraction of the energy of the flow, or else there is not enough 
information in the model ([!]) to examine the flow one is interested in. We 
can thus assume that the Frobenius norm of Q is (much) less than m. By 
the same arguments, we can assume that the Frobenius norm of R is small, 
or else the noise in the data equation overpowers the actual measurements. 
Since small Frobenius norms of Q and R often imply a small Frobenius norm 
of P, we are dealing with a data assimilation problem with a small effective 
dimension. 

Point values of a flow field typically come from a discretization of a 
stochastic differential equation. As one refines this discretization, one can 
expect the correlation between the errors at neighboring grid-points to in- 
crease. These errors are represented by the covariance matrix Q and from 
Khinchin's theorem (see e.g. [8]) we know that a random field with suffi- 
ciently correlated components has a finite energy density (and vice versa). 
This implies that the Frobenius norm of Q does not grow without bound as 
we increase m. 

Another and perhaps even more dramatic instance of this situation is 
one where the random process we are interested in is smooth so that the 
spectrum of its covariance matrix decays quickly [I , 28 . For practical pur- 



poses one may then consider m — d of the eigenvalues to be equal to zero 
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(rather than just very small). This is an instance of "partial noise" [26], i.e. 
the state space splits into two disjoint subspaces, one of dimension d, which 
contains state variables, u, that are directly driven by Gaussian noise, and 
one of dimension m — d, which contains the remaining variables, v, that are 
(linear) functions of the random variables u. Thus, the steady state covari- 
ance matrix is of size d x d and the effective dimension is independent of the 
state dimension and moderate even if m is large. 

Note that the key to the small effective dimension in the above cases 
is the correlation among the errors and indeed, the data assimilation prob- 
lems (or covariances Q and R) derived by various practitioners and theorists 
show a strong correlation of the errors (see e.g. [^[§[^[24}|2^[28j[29j[34}|40] ) . 
The correlations are also key to the well-boundedness of infinite dimensional 
problems 32 where the spectra of the covariances (which are compact op- 
erators in this case) decay; a well correlated noise model was obtained from 
an infinite dimensional problem in [3j[7]. 

The geometrical interpretation of this situation is as follows: because 
of correlations in the noise, the probability mass is concentrated on a d- 
dimensional manifold, regardless of the dimension m > d of the state space. 
In addition one must be careful that the noise in the observations not be 
too strong. Otherwise the data can push the probability mass away from 
the d-dimensional manifold (i.e. the data increase uncertainty, instead of 
decreasing it). This assumption is reasonable, because typically the data 
contain information and not just noise. 

Next, suppose that the vector x in ([I]) and Q represents the components 
of an abstract model with the several components representing various indi- 
cators, for example of economic activity (so that the concept of energy is not 
well-defined). It is unreasonable to assume that each source of error affects 
only one component of x. As an example of what happens when each source 
of error affects many components, consider a model where Gaussian sources 
of error are distributed with spherical symmetry in the space of the x's and 
have a magnitude independent of the dimension m. In an m dimensional 
space, the components of the unit vector of length 1 have magnitude of or- 
der l/yjm, so that the variance of each component must decrease like 1/m. 
Thus, the covariance matrices in ([!]) and ^ are proportional to m~ 1 I m and 
the effective dimension (for A = H = I m ) is ||-P||f = (v^ - l)/2m, which is 
small when m is large. This is a plausible outcome, because the more data 
and indicators are considered, the less uncertainty there should be in the 
outcome (because the new indicators provide additional information). 
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3 Review of particle filters 

In importance sampling one generates samples from a hard-to-sample pdf p 
(the "target" pdf) by producing weighted samples from an easy-to-samplc 
pdf, 7r, called the "importance function" (see e.g. [8j[l9]). Specifically, if 
the random variable one is interested in is x ~ p, one generates samples 
Xj ~ 7r,j = l,...,m, (we use capital letters for realizations of random 
variables) and weighs each by the weight 



TT(Xj) 

The weighted samples {Xj,Wj} (called particles in this context) form an 
empirical estimate of the target pdf p, i.e. for a smooth function u, the sum 

m 

E m (u) = Y,u(X J )W J , 

3=0 

where Wj = Wj/ YljLoWj, converges almost surely to the expected value 
of u with respect to the pdf p as m — > oo, provided that the support of ir 
includes the support of p. 

Particle filters apply these ideas to the recursive formulation of the con- 
ditional pdf: 

nf^n+l \r r n\ ri ( ~n+l I rr n+l\ 

p(/ n+1 |z 1: " +1 )=p(i 0:n |^)^ g gf. f 1 

yy i p(z n+1 \z lm ) 

This requires that the importance function factorize in the form: 

71+ 1 



Tr(x 0:n+1 \z°-- n+1 ] 



7T0(x°)n^V :fc "V :fc )- (3) 



k=l 



where the Hk are updates for the importance function. The factorization of 
the importance function leads to the recursion 



t p{X?+ x \X* )p(Z n+1 \X? +1 • 



W n+1 oc W n 3 3 - (4) 

3 3 n n+l (X^\Xf--,Z^) ' 



for the weights of each of the particles, which are then scaled so that their 
sum equals one. Resampling after every step makes it possible to set = 
1/m when one computes W™ +1 (see e.g. [ll])- Once one has set = 1/m 
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but before sampling, each of the weights can be viewed as a function of the 
random variable and is therefore a random variable. 

The weights determine the efficiency of particle filters. Suppose that, 
before the normalization and resampling step, one weight is much larger 
than all others; then upon rescaling of the weights such that their sum 
equals one, one finds that the largest normalized weight is near 1 and all 
others are near 0. In this case the empirical estimate of the conditional pdf 
by the particles is very poor (it is a single, often unlikely point) and the 
particle filter is said to have collapsed. The collapse of particle filters can be 
studied via the variance of the logarithm of the weights, and it was argued 
rigorously in [2,5,30 31 that a large variance of the logarithm of the weights 



leads to the collapse of particle filters. The choice of importance function tt 
is critical for avoiding the collapse and many different importance functions 
have been considered in the literature (see e.g. [9| [Io|[27|[35}|38] ). Here we 
we follow (2|[5| |30[[3l] and discuss two particle filters in detail. 



3.1 The SIR filter 

A natural choice for the importance function is to generate samples with 
the model 0, i.e. to choose ir n+ i = p{x n+l \x n ). When a resampling step is 
added, the resulting filter is often called a sequential importance sampling 
with resampling (SIR) filter |15| and its weights are 

W™ +1 otp{Z n+1 \Xf +1 ). 

It is known that the SIR filter collapses if the importance function 7r n+ i = 

p(x n+1 \x n ), called the "prior", and the target, or "posterior" density, p(y n+1 \x n+1 )p(x n+1 \x n ), 

are nearly mutually singular. This can happen even in one dimensional 

problems, however the situation becomes more dramatic as the dimension 

m increases. A rigorous analysis of the asymptotic behavior of weights of 

the SIR filter (as the number of particles and the dimension go to infinity) 

is given in [2j[5j|3l] and it is shown that the number of particles required to 

avoid the collapse of the SIR filter grows exponentially with the variance of 

the observation log likelihood (the logarithm of the weights) . 



3.2 The optimal particle filter 

One can avoid the collapse of particle filters in low-dimensional problems 
by choosing the importance function wisely. If one chooses an importance 
function 7r so that the weights in Q are close to uniform, then all particles 
contribute equally to the empirical estimate they define. In |12|23 30 39 the 
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importance function 7r n+ i(x n+1 |a; 0: ' 1 , z 0:n+1 ) = p(x n+1 \x n , z n+1 ), is discussed 
and it is shown that this importance function is "optimal" in the sense that 
it minimizes the variance of the weights given the data and X™. For that 
reason, a filter that uses this importance function is called "optimal particle 
filter" and the optimal weights are 

W? +1 ocp(Z n+1 \Xf). (5) 

For the class of models and data we consider, the optimal particle filter is 
identical to the implicit particle filter [9,27 . The asymptotic behavior of 



the weights of the optimal particle filter was studied in [30] and it was found 
that the optimal filter collapses if the variance of the logarithm of its weights 
is large. 



4 The collapse and non-collapse of particle filters 



The conditions for the collapse have been reported in [2][5][3TJ for SIR and 
in |30| for the optimal particle filter; here we connect these to our analysis 
of effective dimension. 



4.1 The case of the optimal particle filter 

It was shown in [30], that the optimal particle filter collapses if the Frobe- 
nius norm of the covariance matrix of (HQH T + i?) °' 5 HAx n ~ l is large 
(asymptotically infinite as k — > oo). However if this Frobenius norm is 
small, then the variance of the logarithm of the weights is also small so that 
the optimal particle filter works just fine (i.e. it does not collapse). We now 
investigate the role the effective dimension of section [2] plays for the collapse 
of the optimal particle filter. 

If the conditional pdf has reached steady state, then the covariance of 
x n-i j g p ^Yie steady state solution of the Riccati equation), so that the 
Frobenius norm of the symmetric matrix 

S = HAPA T H T (HQH T + R) _1 , (6) 

governs the collapse of the optimal particle filter. If the Frobenius norm of 
X is well-bounded for large m and k, then the optimal particle filter will 
work. The boundedness of £ induces a balance condition between the errors 
in the model and in the data; the situation is analogous to what we observed 
in section [2j 
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To understand this balance condition better, we consider again the sim- 
ple example of section [2j i.e. we set H = A = I m and Q = ql m , R = rl m . 
We already computed P in section [2] and find from ^ that 



The boundedness of 



q 2 + 4qr — q 
2(q + r) 

0(1) thus induces the condition 
y/q 2 + kqr - q 1 



m 



2{q + r) 

With 771 fixed, the left-hand-side depends only on the ratio of the covariances 
of the noise in the model and in the data, so that the level sets are rays. 
In the center panel of figure [2j we superpose these rays, for which optimal 
particle filtering can be successful, with the (q,r)-region in which data as- 
similation is feasible (as computed in section [2]). The left panel of the figure 
shows what is in principle possible, for comparison. 



Data assimilation possible 
Data assimilation not possible 



I Optimal/implicit particle 
' filter works 



SIR filter works 




0.1 



0.2 



Figure 2: Covariance map for successful data assimilation (left panel), and 
two particle filtering algorithms (center and right panel). The broken ellipse 
in the right panel locates the area where the SIR filter works. 



We find that the optimal particle filter can successfully solve many of 
the data assimilation problems that are well defined (in the sense that one 
can expect reliable conclusions given model and data, see section [2]). The 
exception are problems for which q ~ r, i.e. the noise in the model and data 
are equally strong. 

Another way to see this is to set e = q/r so that the balance condition 
becomes 

Ve 2 + 4e - e 1 
2(1+1) ~ 7™' 
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which we solve for m and then plot the maximum dimension m as a function 
the ratio of the noise in the model and the noise in the data; all values smaller 
than this maximum dimension are shown in figure [3] as the light blue area. 
We conclude that the optimal particle filter works for high-dimensional data 




q/r 

Figure 3: Maximum dimension for two particle filters. 



assimilation problems if e is either small or large. The case of large e is the 
case typically encountered in practice. The reasons are as follows: if e is 
small, then the model is very accurate. In this case, neither accurate nor 
inaccurate data can improve the model predictions (this case corresponds 
to the vertical line in figure [2]), i.e. data assimilation is unnecessary since 
one can simply trust the predictions of the model ([I]). If e is large, then the 
uncertainty in the data is much less than the uncertainty in the model, i.e. 
we can learn a lot from the data. This is the interesting case and the optimal 
particle filter (or the implicit particle filter) can be expected to work in such 
scenarios. However, problems occur when e ~ 1 and if the state dimension 
exceeds m rj 20. We expect this case to occur infrequently, because typically 
the data are more accurate than the model. 

It is however important to realize that the collapse of the optimal par- 
ticle filter for e ~ 1 does not imply that Monte Carlo sampling in general 
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is not applicable in this case. Particle filtering induces variance into the 
weights because of its recursive problem formulation and this variance can 
be reduced by particle smoothing. The reason is as follows: the variance of 
the weights of the optimal particle filter depends only on the variance of the 
particles' positions at time n (see ([5])), i.e. each particle is updated to time 
n + 1 such that no additional variance is introduced (this is why this filter 
is called optimal); however the positions at time n may be unlikely in view 
of the data at n + 1 (due to accumulation of errors up until this point). In 
this case, one can go back and correct the past, i.e. use a particle smoother 
(see also section [5]). However, the number of steps one needs to go back in 
time for successful smoothing is problem dependent and, thus, we cannot 
provide a full analysis here (given that we work in a restrictive linear setting 
it seems more realistic and efficient to do this analysis on a case by case 



basis). In particular, it was indicated in two independent papers (37,38 
that smoothing a few steps backwards can help with making Monte Carlo 
sampling applicable in situations where particle filters fail. In [38], the low- 
noise regime (which is an instance of the case where e ~ 1) is considered in 
connection with an application in oceanography and it was found that par- 
ticle smoothing is helpful, however the approximations for optimal particle 
smoothers become difficult and computationally expensive as the problems 
get nonlinear. In [37| , particle smoothing was found to give superior results 
than particle filtering for combined parameter and state estimation, again 
in connection with an application in oceanography. 

In the general case (arbitrary A,H,Q,R), we can simplify the balance 
condition for successful particle filtering by using the upper bound for the 
Frobenius norm of £ : 

||S|| F < \\A\\ 2 F \\H\\ 2 F \\P\\ F \\ (HQH T + R)~ 1 \\ F . 

If we require that this upper bound is less than \fm, then we find, using the 
upper bound 

Vrn = \\I\\ F < || (HQH T + R) \\ F \\ (HQH T + R)' 1 \\ F , 

that 



F \\H\\ F \\P\\ F < \\H\\ F \\Q\\ F + \\R\\ F , 

is a sufficient condition that the Frobenius norm of £ is small (i.e. it grows 
slower than 0{y/rn)). As in section 2, we find that the balance condition in 
terms of ||-R||f and ||Q||f 5 is simple in simple cases, but delicate in general. 



16 



4.2 The case of the SIR filter 



The collapse of the SIR filter has been studied in [2||5 31 , and it was shown 
that, for a properly normalized model and data equation, this collapse is 
governed by the Frobenius norm of the covariance of Hx n ; undoing the 
scaling, and noting that x n_1 has covariance P (the steady state solution of 
the Riccati equation), we find that the Frobenius norm of 

£ = H(Q + APA T ) H T R- 1 . 

governs the collapse of SIR filters. Thus, the boundedness of is the 

balance condition for successful data assimilation with an SIR particle filter. 
For the simple example considered earlier {A = H = I m , Q = ql m , R = 
rl m ), this condition becomes 

a/(/ 2 + Aqr + q 1 



2r \/rn 



For m = 100, the (q,r)-region for which data assimilation with an SIR filter 
can be successful is plotted in the right panel of figure [2j We observe that 
this region is very small compared to the region that is accessible with an 
optimal particle filter. 

We can also set e = q/r and obtain 

Ve 2 + 4e + e 1 



m 



which we solve for m so that we can plot the maximum dimension for which 
SIR particle filtering can be successful as a function of the covariance ra- 
tio e (see figure [3]). Again, we observe that the SIR particle can only be 
useful in a limited class of problems. In particular, we find that the SIR 
particle filter works in high-dimensional problems only if the model is very 
accurate (compared to the data). However, we argued before that this case 
is somewhat unrealistic, since we expect that the errors in the model be 
typically larger than the errors in the data (or else the data are not very 
useful, or particle filtering unnecessary because the model is very good). In 
these realistic scenarios, the SIR particle filter collapses and we conclude 
that, as the dimension m increases, it becomes more and more important 
to use the optimal importance function or a good approximation of it (see 



e.g. [27,36-38 for approximations of the optimal filter). 



In the general case, we can use an upper bound, e.g. 

\\E\\ F < ||#||£||iT 1 || i r(||Q|| F + ||A|||,||P| 
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and if we require that this bound is less than \/rn, we obtain the simplified 
balance condition 

||H|||(||Qlb + ||^|||||P||)<||i?||F. 

The above condition implies that the Frobenius norm of the covariance ma- 
trix of the model noise, Q, must be much smaller than the Frobenius norm 
of the covariance matrix of the errors in the data, which is unrealistic. 



4.3 Discussion 

We wish to point out differences and similarities of our work and the asymp- 
totic studies in [^[5j[30}[3l]. Clearly, the results of |2[[5| [30t[31 are used in 



our analysis of the optimal particle filter (section 4.1) and the SIR filter 
(this section). Moreover, our analysis confirms Snyder's findings in |30| , 
where it was shown that the optimal particle filter "dramatically reduces 
the required sample size" (by lowering the exponent in the relation between 
the number of particles and the state dimension). In [2j[5j[30j[3l], it was 
shown that the number of particles required grows exponentially with the 
variance of the logarithm of the weights; the variance of the logarithm of the 
weights if governed by the Forbenius norms of covariance matrices (which 
are different for SIR and the optimal particle filter). Our main contribu- 
tion (in connection with particle filters) is to study the connection of these 
Frobenius norms with the effective dimension of section if the effective 
dimension is well-bounded then these Frobenius norms do not necessarily 
grow with the state dimension m. Thus, we can find conditions under which 
the SIR and optimal particle filters can work. We also explain the physical 
interpretation of our results and conclude that the optimal particle filter can 
work for many realistic (and large dimensional) problems, because realistic 
conditions imply a moderate effective dimension. 



5 Particle smoothing and variational data assimi- 
lation 

We now consider the role of the effective dimension in particle smoothing 
and variational data assimilation. The idea here is to replace the step-by- 
step construction of the conditional pdf in a particle filter (or Kalman filter) 
by direct sampling of the full pdf p(x 0:n \z 1:n ), i.e. all available data are 
assimilated in one sweep. Particle smoothers apply importance sampling to 
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obtain weighted samples from this pdf, and in variational data assimilation 
one estimates the state of the system by the mode of this pdf. 

It is clear that either method can only be successful if the Frobenius 
norm of the covariance matrix of the variables conditioned on the data is 
small, or else the samples of numerical or physical experiments collect on 
a thin shell far from the most likely state (to obtain this result, one has 
to repeat the steps in section 2). We now determine the conditions under 
which this Frobenius norm is small. As is customary in data assimilation, we 
distinguish between the "strong constraint" and "weak constraint" problem. 

5.1 The strong constraint problem 

In the strong constraint problem one considers a "perfect model", i.e. the 
model errors are neglected and we set Q = (see e.g. |33|). Since the 
initial conditions determine the state trajectory, the goal is to obtain initial 
conditions that are compatible with the data, i.e. we are interested in the 
pdf 

p{x°\z l -- n ) ocexp (x° - ijl ) T So 1 {x° - fi ) 

(1 n 
— {z j - HAix°) T R- 1 (z j - HA^x ) 

Straightforward calculation shows that this pdf is Gaussian (under our as- 
sumptions) and its covariance matrix is 

n 

5T 1 = E 1 + Y,( Aj ) T H T R' lHAj - 

i=i 

As explained above, data assimilation for the Gaussian model makes sense 
only if the Frobenius norm of this matrix is small (or at least, if it does not 
grow with the state dimension). In this case, the samples collect on a small 
and low-dimensional ball, close to the most likely state. The boundedness 
of the Frobenius norm of E induces a balance condition between the prior 
errors (So) and the errors on the data (R). The situation is analogous to the 
balance conditions we encountered before in sequential data assimilation. 

We illustrate the balance condition for the strong constraint problem 
by considering a version of the simple example we used earlier, i.e. we set 
A = H = I m , Q = 0, R = rl m , and, in addition, n = 1, Sq = ool m . In this 
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case, we can compute £ and its Frobenius norm: 




Figure 4: Covariance map for strong 4D-Var and optimal particle smoothing. 



given state dimension, the values of r and <jq below the corresponding curve 
lead to a small ||£||f- We observe that, for a fixed m, a larger error in the 
prior knowledge (ctq) can be tolerated if the error in the data is very small 
and vice versa. Similar observations were made in 16 , 17 in connection with 
the condition number in 3D-Var. 

Variational data assimilation (strong 4D-Var) represents the conditional 
pdf by its mode, i.e. by a single point in the state space. The smaller is 
the ball on which the samples collect (i.e. the smaller the Frobenius norm 
of £), the more applicable is strong 4D-Var. Particle smoothers on the 
other hand construct an empirical estimate of the pdf via sampling. Since 
the target pdf is Gaussian, we can construct an optimal particle smoother 
(minimum variance in the weights) by sampling this Gaussian, so that the 
weights are constant (zero variance). It is clear that, for realistic conditions 
(small | |S| \p) the optimal particle smoother can be expected to perform well, 
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regardless of the state dimension m, because it can efficiently represent the 
pdf one is interested in. 

The situation is different for other particle smoothers. Consider, for 
example, the SIR-like particle smoother that uses p(xo) as its importance 
function. This filter produces weights whose negative logarithm is given by 

^ = \ E ( ZJ - HAix^R- 1 (Zi - HAix ) . 

3=1 

For n = 1 , the variance of these weights depends on the Frobenius norm of 
the matrix H AY*qA t H T R -1 , which has the upper bound 

\\HAE A T H T R- 1 \\ < lltfll^llAII^HSollFll^ 1 !!. 

If we require that this upper bound is less than y/m then we obtain (using 
\fm < the condition 

||fr|||.||A|||.||Eo||F<||12||, 

which implies that the errors before we collect the data must be smaller 
than the errors in the data, which is unrealistic. In particular, for the simple 
example considered above we find that do < r/y/m. We conclude that, as 
in particle filtering, particle smoothing is possible under realistic conditions 
only if the importance function is chosen carefully. 

Note that the results we obtained here are different than those we would 
obtain if would simply put Q = in the Kalman filter formulas of section 2. 
It is easy to show that for Q = the steady state covariance matrix converges 
to the zero matrix. What this means is that, with enough data, one can wait 
for steady state, and then accurately estimate the state at large n. What we 
have done in this section is to consider the consequences of having access to 
only a finite data set, i.e. making predictions before steady state is reached. 

Finally, note that, in contrast to the sequential problem, the minimum 
variance of the weights of the smoothing problem is zero, whereas particle 
filters always produce non-zero variance weights. This variance is induced by 
the factorization of the importance function it, and since this factorization 
is not required in particle smoothing, this source of variance can disappear 
(or be reduced) by clever choice of importance functions. 
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5.2 The weak constraint problem 

In the weak constraint problem (see e.g. [4j), one is interested in estimating 
the full state trajectory given the data, i.e. in the pdf 



p{x Q -- n \z v - n ) ocexp (-1 (x° - /i ) T S - 1 (x° - Mo) 

1 n 

( zj - Hx j f R- 1 (z j - Hx j ) 



x exp 



2 

3=1 



An easy calculation reveals that this pdf is Gaussian and its covariance 
matrix is 

/ S^+AT^A -ATQ- 1 ■■■ \ 

-Q~ l A Q-'L+ATQ^A + HTR- 1 !! -A T Q~ l 

>] 1 

: -A T Q-! 
\ ■■■ -Q~ 1 A Q- 1 +H T R' 1 H J 

For the same arguments as before, data assimilation can only be successful 
if the Frobenius norm of E is moderate. This implies (again) a delicate 
balance condition between the errors in the prior knowledge (||So||f)> the 
errors in the model ([!]) (||<3||f) and the errors in the data Q (||i?||_p). 

As in the strong constraint problem, variational data assimilation (weak 
4D-Var) represents this pdf by its mode (a single point) and this approx- 
imation is the more applicable, the smaller the Frobenius norm of S is. 
An optimal particle smoother can be constructed for this problem by sam- 
pling directly (zero variance weights) the Gaussian conditional pdf. For the 
same reasons as in the previous section, we can expect an optimal parti- 
cle smoother to perform well under realistic conditions, but also can expect 
difficulties if the choice of importance function is poor. 



6 Limitations of the analysis 

We wish to point out limitations of the analysis above. To find the conditions 
for successful data assimilation we study the conditional pdf and we rely on 
the Kalman formalism to compute it. Since the Kalman formalism is only 
applicable to linear Gaussian problems, our results are at best indicative of 
the general nonlinear/non-Gaussian case. However, we believe that the gen- 
eral idea of that the probability mass must concentrate on a low-dimensional 
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manifold holds in the nonlinear case as well. Since Khinchin's theorem is 
independent of our linearity assumption, and since we expect that correla- 
tions amongst the errors also occur in nonlinear models, one can speculate 
that the probability mass does collect on a low-dimensional manifold (un- 
der realistic assumptions on the noise). However finding (or describing) this 
manifold in general becomes exceedingly difficult and is perhaps best done 
on a case-by-case basis in which special features, or structures, in the model 
at hand can be exploited. 

We have further assumed that all model parameters, including the co- 
variances of the errors in the model and data equations, are known. If these 
must be estimated simultaneously (combined parameter and state estima- 
tion), then the situation becomes far more difficult, even in the case of a 
linear model equation ([I]) and data stream Q. It seems reasonable that 
estimating parameters using data at several consecutive time points (as is 
done implicitly in some versions of variational data assimilation or particle 
smoothing) would help with the parameter estimation problem and perhaps 
even with model specification. 

Concerning particle filters, we have examined in detail only two choices 
of importance function, the one in SIR, where the samples are chosen inde- 
pendently of the data, and, at the other extreme, one where the choice of 
samples depends strongly on the data. There is a large literature on impor- 
tance functions, see [9 p0t|T2p7p5}|38] ; it is quite possible that other choices 
can outperform the optimal/implicit particle filter even in the present lin- 
ear synchronous case once computational costs are taken into account. In 
nonlinear problems the optimal particle filter is hard to implement and the 
implicit particle filter is suboptimal, so further analysis may be needed to see 
what is optimal in each particular case (see also |36[|38| for approximations 
of the optimal filter). 

More broadly, the analysis of particle filters in the present paper is not 
robust as assumptions change. For example, if the model noise is multiplica- 
tive (i.e. the covariance matrices are state dependent), then our analysis does 
not hold, not even for the linear case. Moreover, the optimal particle filter 
becomes very difficult to implement, whereas the SIR filter remains easy to 
use. Similarly, if model parameters (the elements of A or the covariances Q 
and R) are not known, simultaneous state and parameter estimation using 
an optimal particle filter becomes difficult, but SIR, again, remains easy 
to use. While the filters may not collapse in these cases, they may give a 
poor prediction. The existence of such important departures is confirmed 
by the fact that the ensemble Kalman filter and square root filter differ sub- 
stantially in their performance. However, our analysis indicates that, if (fij) 
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and ([2]) hold, the ensemble Kalman filter, the Kalman filter and the optimal 
particle filter are equivalent in the non-collapse region of the optimal filter. 

Similarly, variational data assimilation or particle smoothing can be suc- 
cessful (and is indeed equivalent to Kalman filtering) if ([!]) and ^ hold. 
We expect that variational data assimilation and particle smoothing can be 
successful in the nonlinear case, provided that the probability mass concen- 
trates on a low-dimensional manifold. In particular, particle smoothing has 
the potential of extending the applicability of Monte Carlo sampling to data 
assimilation, since the variance of weights due to the sequential problem 
formulation in particle filters is reduced (the data at time 2 may label what 
one thought was likely at time 1 as unlikely). This statement is perhaps 
corroborated by the success of variational data assimilation in numerical 
weather prediction. 

Finally, it should be pointed out that we assumed throughout the paper 
that the model and data equations are "good" , i.e. that the model and data 
equations are capable of describing the physical situation one is interested 
in. It seems difficult in theory and practice to study the case where the 
model and data equations are incompatible with the (real) data one has 
collected. For example, it is unclear to us what happens if the covariances 
of the errors in the model and data equations are systematically under- or 
overestimated, i.e. if the various data assimilation algorithms work with 
"wrong" covariances. 

7 Conclusions 

We have investigated the conditions under which data assimilation is feasi- 
ble, regardless of the algorithm used to do the assimilation. We quantified 
these conditions by defining an effective dimension of a Gaussian data as- 
similation problem and have shown that the boundedness of the effective 
dimension induces a balance condition for the errors in the model and data. 
This condition must be satisfied or else one cannot reach reliable conclu- 
sions about the process one is modeling, even when the (linear) model is 
completely correct. The balance condition is often satisfied for realistic 
models, i.e. the effective dimension is moderate, even if the state dimension 
is large. 

The analysis was carried out in the linear synchronous case, where it can 
be done in some generality; we believe that this analysis captures the main 
features of the general case, but we have also discussed the limitations of 
the analysis. 
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Building on the results in [2j[5j[30j[3T], we studied the effects of the 
effective dimension on particle filters in two instances, one in which the 
importance function is based on the model alone, and one in which it is 
based on both the model and the data. We have three main conclusions: 

1. The stability (i.e., non-collapse of weights) in particle filtering depends 
on the effective dimension of the problem. Particle filters can work well 
if the effective dimension is small even if the true dimension is large 
(which we expect to happen often in practice). 

2. A suitable choice of importance function is essential, or else particle 
filtering fails even when data assimilation is (in principle) possible with 
a sequential algorithm. 

3. There is a parameter range in which the model noise and the obser- 
vation noise are roughly comparable, and in which even the optimal 
particle filter collapses, even under ideal circumstances. 

We have then studied the role of the effective dimension in variational 
data assimilation and particle smoothing, for both the weak and strong con- 
straint problem. It was found that these methods too require a moderate 
effective dimension or else no accurate predictions can be expected. More- 
over, variational data assimilation or particle smoothing may be applicable 
in the parameter range where particle filtering fails, because the use of more 
than one consecutive data set helps reduce the variance which is responsible 
for the collapse of the filters. 

These conclusions are predicated on the linearity of the model and data 
equations, and on the assumption that the generative and data models are 
close enough to reality. 
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